Project
Title
Description
Hardware
Software
Electrical component themed AI detection and identification.
A mounted camera above a surface (part of the product)
Produces a controlled environment live feed for the application
An application running inference on a live USB camera feed (optionally imported
picture or video)
Application
Modification of the provided data to simulate differences in the environment and to
provide imperfections to train against
Augmentation examples
Addition of glare
Rotation
Blurring
Addition of spots
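As an illustrative sketch of how one of these augmentations could look in C++ (the function name and parameters are hypothetical, not part of the project's codebase; real pipelines normally use library-provided augmentations), a soft circular glare spot can be blended into a grayscale image buffer:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Add a soft circular glare spot to a grayscale image stored row-major
// with pixel values in [0, 255]. The glare fades linearly with distance
// from the center (cx, cy) and vanishes at `radius`.
void addGlare(std::vector<uint8_t>& img, int w, int h,
              int cx, int cy, double radius, double strength) {
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            double d = std::hypot(x - cx, y - cy);
            if (d >= radius) continue;               // outside the glare spot
            double gain = strength * (1.0 - d / radius);
            int v = img[y * w + x] + static_cast<int>(255.0 * gain);
            img[y * w + x] = static_cast<uint8_t>(std::min(v, 255));
        }
    }
}
```

Rotation, blurring and spots can be implemented in the same buffer-processing style.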
GUI
The application is built with the Qt framework via Qt Creator, using C++
Inference
Running on C++
Utilising
Ultralytics YOLOv8
Summary
Detection via Inference
Detect and display boundaries for each identified class from the input image using
Inference.
Identification
Post-processing of the components in the bounding boxes detected by inference,
which may have additional information that can be identified by a variety of
approaches.
Examples
LEDs
Resistors
Resistor code value
LED color
Technology
AI based Electrical Component Identifier
IC Components
Pin count
Information written on the component
Features
Inference
Classes to train the model to detect:
Resistor
Diode
Capacitor
LEDs
Integrated Circuits
AC
DC
LDR
Milestones
Base camera rig
Initial inference model training
Inference running
Testing with video footage from a mobile device
Research
Models
Ultralytics YOLO
Live Labeling
Focus Audience
Set Rig
The fixed position of the camera, the significantly reduced distance to the objects, the consistent lighting provided by the ring light, and the static background will considerably boost the confidence of the inference.
Training
Running
Post-processing
Rationale
Timeline
Gantt Chart
Live training
YOLOv5
Training
1st batch, test run
Image Count
100
Classes (1 Total)
Resistor
2nd batch
Image Count
Training
Testing
20
Training
Testing
1800
540
3rd batch
Image Count
Training
Testing
2393
724
Classes (9 Total)
red_led
green_led
blue_led
yellow_led
ac_capacitor
dc_capacitor
resistor
sip_resistor
pcb_terminal
Classes (10 Total)
red_led
green_led
blue_led
yellow_led
ac_capacitor
dc_capacitor
resistor
sip_resistor
pcb_terminal
metal_nut
Augmentation
Default
Augmentation
Default
Average time per epoch
34 seconds
Epoch count
400
Epoch count
250
Epoch count
300
Augmentation
Default
YOLOv8
Due to the angle and lighting both being known and mostly set thanks to using a set
rig, the input dataset does not need to cover angles and lighting outside what the rig
will expose it to during runtime.
The sum of all the points covered above results in a significant reduction in the data required for training when compared to a setup without a set rig, for equivalent confidence values during runtime.
The angle range is reduced to a single top-down view, eliminating the need to cover the rest of the angle range.
While the lighting will change depending on the room conditions, the ring light around
the camera will provide significant consistency in lighting.
While this does not eliminate the necessity to train against various lighting conditions, it does reduce their significance and increase the certainty of the detection.
Only the components being detected need to be trained at all angles, as opposed to the camera gathering the dataset needing to be positioned at different angles.
Having a top-down view also eliminates the majority of issues that come with glare from high-luminosity bodies, such as clouds or the sun.
A set rig significantly limits the distance that the objects will be from the camera during
runtime, allowing for further confidence in the predictions.
Static background
Angle range
Lighting
Apart from dust or unexpected objects on the rig's surface, which should be removed before use, the background that the objects sit in front of will stay mostly consistent.
This reduces the necessity to gather data of the same object under backgrounds that
are not expected to be used during runtime.
While this project may be retrained and refocused to be utilised for many different
fields - it is trained for electrical component identification, which is focused towards
engineers.
Architectures
This project focuses on both existing engineers, and ones that are interested in
becoming engineers.
Having access to the quick identification of components, the count of each, and any potential additional information provided by the project saves the time otherwise spent manually analysing this information.
Average time per epoch
2 minutes
Average time per epoch
2 minutes and 20 seconds
SIP Resistor
Singular
Acronyms
SIP
Introduction
Single Inline Package
GPU
Graphics Processing Unit
CPU
Central Processing Unit
AI
Artificial Intelligence
LDR
Light Dependent Resistor
LED
Light Emitting Diode
AC
DC
Alternating Current
Direct Current
PCB terminal
PCB
Printed Circuit Board
The most prominent color may be identified by binning all the colors in the image by their hue values and checking which hue occurs most often.
The color codes can be identified by processing the image with filters and other operations until only the prominent band colors remain.
These can be processed into the actual ohm value.
Then, the positions of the color bands relative to the body of the resistor can be used to determine their specific positions and order.
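Once the band colors have been extracted in reading order, converting them to an ohm value is a simple lookup. A minimal C++ sketch (the function name and band representation are assumptions for illustration; the tolerance band, e.g. gold = 5%, is looked up separately and omitted from the numeric result):

```cpp
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Decode a 4-band resistor from its band colors, listed in reading order.
// Bands 1 and 2 are significant digits; band 3 is the power-of-ten
// multiplier.
double decodeResistorOhms(const std::vector<std::string>& bands) {
    static const std::map<std::string, int> digit = {
        {"black", 0}, {"brown", 1}, {"red", 2},   {"orange", 3},
        {"yellow", 4}, {"green", 5}, {"blue", 6}, {"violet", 7},
        {"grey", 8},  {"white", 9}};
    int d1 = digit.at(bands.at(0));
    int d2 = digit.at(bands.at(1));
    int multiplier = digit.at(bands.at(2));  // exponent of ten
    return (d1 * 10 + d2) * std::pow(10.0, multiplier);
}
```

For a 220 Ohm resistor read as red, red, brown, this gives (2*10 + 2) * 10^1 = 220 Ohms.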
The pin count can be identified by processing the image using filters until there is a
clear contrast between the body of the chip, and the pins.
One approach that could help identify the number of pins would be drawing a line
between two of the pins and seeing how many of the pins touch this line. Taking the
line that touches the most pins would provide the pin count of this IC.
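The line-across-the-pins idea can be sketched as counting contiguous runs of foreground pixels along one scan line of the thresholded image (the function name and the boolean-row representation are illustrative assumptions):

```cpp
#include <vector>

// After thresholding, a scan line across the pins is a row of boolean
// pixels (true = pin material). Each contiguous run of true pixels is one
// pin touching the line, so counting rising edges gives the pin count.
int countPinsOnScanLine(const std::vector<bool>& line) {
    int pins = 0;
    bool inPin = false;
    for (bool px : line) {
        if (px && !inPin) ++pins;  // rising edge: a new pin starts here
        inPin = px;
    }
    return pins;
}
```

Running this over several candidate lines and taking the maximum would implement the "line that touches the most pins" rule described above.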
OCR may be used
OCR
Optical Character Recognition
Software-based reading of characters from an image that contains written text.
Input image
Inference method
Algorithm method
Different color LEDs may be trained as individual classes.
Has the disadvantage of requiring training for each individual LED separately, as
opposed to one generic LED.
Has the advantage of working on any LED.
Raw input
High contrast filter
Colors histogram
Approaches after filtering
Clearly prominent yellow
Has the disadvantage of potentially giving false information if the background is too
vibrant.
Contrast approach
HSV
Taking the average of the hues of all pixels whose value (brightness) is above a certain threshold. Around 0.7 on a range from 0 to 1 should be appropriate.
Hue is in the range of 0 to 360 degrees.
The pink dots represent the pixel values obtained from the previous step.
Taking the average of this data, the result lands on a degree value that can easily be determined to be yellow by separating the hue circle into color sections by degree ranges.
HSV, or Hue Saturation Value, is an alternative way to represent colors.
It can be advantageous over RGB in situations such as this.
RGB
Red Green Blue
Commonly used to refer to a way of defining colors by their Red, Green and Blue properties.
HSV
Hue Saturation Value
Commonly used to refer to a way of defining colors by their Hue, Saturation and Value properties.
Yellow is between 72° and 108° on the hue circle.
Note: This example would ignore colors that are darker than 0.7, on a range of 0 to 1.
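The thresholded hue-averaging approach can be sketched as follows (names are assumptions; this uses the standard HSV conversion, and the exact degree ranges assigned to each color name depend on the hue wheel used). Note that a plain arithmetic mean breaks for hues that wrap around 0°/360°, so a circular mean via sine and cosine is used instead:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct RGB { double r, g, b; };  // each component in [0, 1]

const double PI = 3.14159265358979323846;

// Hue of a color in degrees [0, 360), standard HSV conversion.
double hueDeg(const RGB& c) {
    double mx = std::max({c.r, c.g, c.b});
    double mn = std::min({c.r, c.g, c.b});
    double d = mx - mn;
    if (d == 0.0) return 0.0;  // achromatic: hue undefined, return 0
    double h;
    if (mx == c.r)      h = std::fmod((c.g - c.b) / d, 6.0);
    else if (mx == c.g) h = (c.b - c.r) / d + 2.0;
    else                h = (c.r - c.g) / d + 4.0;
    h *= 60.0;
    return h < 0.0 ? h + 360.0 : h;
}

// Circular mean of the hues of all pixels whose value (brightness) is
// above the threshold; darker pixels are ignored.
double dominantHue(const std::vector<RGB>& pixels, double threshold = 0.7) {
    double sx = 0.0, sy = 0.0;
    for (const RGB& p : pixels) {
        if (std::max({p.r, p.g, p.b}) < threshold) continue;
        double rad = hueDeg(p) * PI / 180.0;
        sx += std::cos(rad);
        sy += std::sin(rad);
    }
    double h = std::atan2(sy, sx) * 180.0 / PI;
    return h < 0.0 ? h + 360.0 : h;
}
```

The returned degree value can then be classified by comparing it against the chosen color sections of the hue circle.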
The ability to take a snapshot of the current frame, defining appropriate labels, and
saving this labeled snapshot for future training. All from inside the GUI.
Alternatively, taking snapshots of the GUI and saving them for later labeling.
Sorted from highest priority, to lowest.
Setting up the camera on a rig
Base GUI
GUI with the essentials to interface with the camera over USB, with
A live display from the camera on the rig.
Ability to take images by pressing a button.
Support for running Inference.
~100 images of a single class, taken from the rig for initial training and testing of the
model.
Initial dataset gathering
For the purpose of testing inference on the rig.
Proof of concept. The results will not be perfect as the dataset is minimal, and only
contains 1 class.
Further dataset gathering
At least 250 pictures of each class of every component that the project is designed to
detect.
Further model training
This training will take considerably longer than the initial training: around 2 minutes per epoch, and it should be run for at least 300 epochs.
The initial training should not take long at all, and does not need to be polished. Training for ~100 epochs should be sufficient, with each epoch taking ~20 seconds on the machine available.
Rig
Model Training via Deep Learning
Machine used
Personal Computer
CPU
GPU
AMD Ryzen™ 7 5800X3D
Core count
8
Base clock frequency
3.4GHz
L3 Cache
96MB
Maximum operating temperature
90°C
Thread count
16
GeForce RTX 3060 Ti
Memory
8192MB
CUDA core count
4864
Capacity
Type
GDDR6X
Base clock frequency
1.41GHz
The goal is to reach confidence values of 0.8 on a range from 0 to 1.
Ability to gather further information from the detection bounding boxes provided by the
inference.
After the previous steps are in good shape, investigation of moving the inference to a
mobile device will begin.
If the confidence values are not up to standard, more data will be gathered from this
and potentially other mobile devices, and further training will follow, until the results
are adequate.
If adequate results are achieved before the deadline of this project, deployment to a
mobile device will be started.
If the frame rates are not sufficient, the inference may be run on still images to improve the user experience.
Optional: Ability to label the images from the device, without requiring external
software.
It may be advantageous, given the timeframe of the project, to instead gather data during a session and label it afterwards.
Memory
Capacity
Type
2x16GB
DDR4
Frequency
3.2GHz
Brand
Corsair
Name
Vengeance RGB PRO SL
Link
https://www.corsair.com/eu/en/Categories/Products/Memory/Vengeance-RGB-PRO-
SL-Black/p/CMH32GX4M2E3200C16
Brand
AMD
Name
Ryzen 7 5800X3D
Link
https://www.amd.com/en/products/cpu/amd-ryzen-7-5800x3d
Brand
NVIDIA
Series
30
Name
RTX 3060Ti
Link
https://www.nvidia.com/en-gb/geforce/graphics-cards/30-series/rtx-3060-3060ti/
CUDA
Specialised cores designed for compute-intensive tasks.
These run in parallel with the CPU, and may also run in parallel across multiple GPUs.
They are perfect for deep learning, as deep learning is incredibly compute-intensive.
Deep learning training times are predictable, and stay mostly constant between epochs.
The workload per epoch is fixed and parallelises cleanly, so the more processing power available, the quicker each epoch will finish.
Each of these steps should be polished before continuing to the next one, to provide a
solid foundation for the next step to be based on.
Analysis
Brief History
YOLO, which stands for You Only Look Once, is a popular image segmentation and object detection model that was originally developed by Joseph Redmon and Ali Farhadi.
The first version was released in 2015, and it very quickly became popular due to its significantly superior speed and accuracy compared to other architectures.
YOLOv1
YOLOv4
Released in 2020, introducing Mosaic data augmentation and a new, improved loss function, decreasing the time taken to achieve better results for the trained model.
YOLOv5
Released in 2020, introducing support for Object Tracking, which allows following a moving object, and Panoptic Segmentation, which allows identification of overlapping objects with accurate bounding boxes.
Ultralytics YOLOv8
The latest version of YOLO as of today. YOLOv8 is a state-of-the-art model that builds
upon the already very successful previous YOLO versions, introducing new
performance and flexibility features.
Full support for previous YOLO versions, making it incredibly convenient for existing
users of previous YOLO versions to take advantage of the new features.
Versions
Comparison
In general, YOLOv8 is superior to all of its predecessors.
While YOLOv5 mostly underperforms compared to the later versions, it is important to note how minimal the delays are even on a now-outdated version.
YOLO offers pretrained models that are used as a starting point for training custom models.
Each model has its advantages and disadvantages, and should be picked depending
on the project.
Size
mAP single-model single-scale values while detecting on the COCO val2017 dataset.
Speed
Averaged time taken using the Amazon EC2 P4d instance on the COCO dataset.
The pixel height and width up to which the model operates.
Params (In Millions)
The number of parameters that are tweaked per epoch while training, and processed
during inference.
FLOPS
Floating Point Operations Per Second
A measure based on Floating Point Operations that is relevant in the field of Deep
Learning.
Diminishing returns can be observed in the mAP values when compared to the time taken (Speed).
Model properties
In some circumstances, maximum precision is essential and is prioritised over the hardware requirements. This is when a larger model should be chosen.
In the scope of this project - the YOLOv8m model has been chosen.
The rationale behind this choice is to take advantage of the high mAP value while not increasing the time taken too much, in preparation for a future mobile deployment of the model.
Comparing the YOLOv5 and YOLOv8 versions, a clear advantage can be seen when taking into account the size of the model (parameter count), the resulting mAP output, and the time taken.
Architecture choice
YOLO has been chosen as the architecture that this project utilises for the AI
detection.
At the start of the project, there was already a strong bias towards YOLO due to highly positive past experience with YOLOv5 and all the features that it offers.
Upon the release of YOLOv8, with all the superior features and specifications that it provides on top of the previous versions, YOLOv8 was an obvious choice for the architecture used in the project.
Description
As the name suggests, YOLO focuses on detection of multiple classes in a single
"look", which is a single analysis of the entire input image.
Compared to many architectures that preceded YOLO, this is a far superior approach: no matter how quick those architectures may be, they would approach detection by reanalysing the entire image for every single class that the model was trained for, increasing the time taken per detection additively per class.
An approach like this may seem too good to be true, as if it should come with a significant cost to the speed and confidence of the model.
But when the results are analysed, that could hardly be further from the truth.
YOLO is an incredibly efficient and accurate architecture.
These days most sophisticated architectures approach object detection similarly to
YOLO, but YOLO is still a state-of-the-art architecture that continues to improve and
grow to this day.
Internal AI Object Detection steps
Classification
Object Detection
Segmentation
The process of identifying the exact bounding box of the item detected.
The enclosure of the classified segments of the image in bounding boxes.
The identification of a part of an image believed to contain an item of a class the model was trained to detect.
Visual examples
Resizing
Joining up of multiple images to create new ones
The reduction in the data required to train makes it feasible to train relatively high-quality models from data gathered, and models trained, at home.
Marking Codes
Hardware
Raspberry Pi
Beaglebone
Nvidia Jetson Nano
Intel Neural Compute Stick 2
Specifications
Processor Base Frequency
700MHz
Memory
2GB
Specifications
Core Count (GPU)
128
GPU Max Frequency
921MHz
Core Count (SHAVE)
16
Advantage
Offers computational power through a USB connection - can be used to run Inference
on existing devices, such as a laptop.
Specifications
Core Count (GPU)
2
GPU Max Frequency
532MHz
Specifications
Core Count (GPU)
4
GPU Max Frequency
700MHz
Type
Standalone
Type
Standalone
Type
Standalone
Type
Extension
Resistors and Inductors
Capacitors
ICs
Color coded
Number coded
Android Phone
Specifications depend on the specific device
Benefits
Widely and easily accessible
On average, superior to the alternatives.
Has a built-in camera that is considerably more convenient than the alternatives.
YOLO
You Only Look Once
An image detection architecture that the project is based on.
CUDA cores provided by the GPU
CPU
Inference
Training
Personal Computer
Rented Dedicated Server
Advantages
Disadvantages
Advantages
Disadvantages
Local - Provided a local machine is already owned, it is immediately available.
Utilises multiple GPUs - Quicker epoch computations, resulting in quicker training.
Cloud based
Allows for parallel computing, as opposed to using your personal computer at home.
Cloud based - upload and download times
Datasets tend to be considerably large.
A smaller dataset of ~2000 images takes up ~3GB of space.
This is not a significant amount of data for a local machine to transfer, but it is a considerable amount to upload.
Cost
The bigger the server - the higher the rates become.
Cost
As opposed to renting a server, acquiring your own machine has the benefit of ownership: it can be used indefinitely (or until it eventually breaks).
While the initial cost of acquiring an adequate machine for deep learning is higher than
renting a server for a few months, it is a worthwhile long-term investment into a
machine that can be used for a variety of casual or intensive tasks.
Setup time
Setup time
Speed
Speed
When compared to a sophisticated server that runs many GPUs - a local machine will
most likely process the training at a slower rate than a dedicated server would.
A local machine will likely contain one, maybe two GPUs.
Pictures are taken from the machine itself. No upload/download times.
Devices
Discussion
When compared to training, usage of the trained model to run inference is considerably quicker.
R-CNN
Description
Disadvantages
Not real-time.
On average, takes 47 seconds to process a single frame.
Discussion
It should be noted that R-CNN has successors called Fast R-CNN and Faster R-CNN.
However, even the fastest of the choices still barely manages 5 frames a second at
best.
R-CNN, which stands for Region-Based Convolutional Neural Networks, was released in 2013. Like other object detection architectures, R-CNN takes an input image and outlines bounding boxes where it believes an item of a certain class is present.
While 5 frames a second is an impressive and definitely usable result, there are alternative architectures that offer a significant improvement in inference time.
Developed by Ross Girshick
SSD
Description
SSD, which stands for Single Shot Detector, was released in 2017.
Developed mostly by Max deGroot and Ellis Brown
Discussion
Offers great frame rates, averaging 45 frames per second when tested on a now relatively old graphics card, the NVIDIA GTX 1060.
Disadvantages
According to the Git repository, the project was seemingly abandoned about 4 years
ago.
According to the Git repository, the project was seemingly abandoned about 5 years
ago.
Discussion
One of the most feature-rich, cutting-edge, state-of-the-art and popular architectures
that is in use today.
The component of a computer where the core computations are processed.
An optional component of a computer that is dedicated to, and optimised for, computing graphical tasks.
Existing labelling software offers quality-of-life features, such as rough auto-labelling of the images, which only requires the user to adjust the bounding boxes and confirm their validity, rather than having to define the boxes from scratch.
Inference Example
Inference Example
Discussion
Surprisingly good results for a model trained from 120 images, with confidence values
above 0.8 and sometimes over 0.9!
Pretrained model used
yolov5s
Architecture
YOLOv5
Architecture
Pretrained model used
yolov5m
YOLOv5
Architecture
Pretrained model used
yolov5m
YOLOv5
Discussion
Rather poor results. Confidence values usually below 0.7, struggled to classify
accurately.
Discussion
Great results with confidence values consistently above 0.8, classifying all classes
accurately!
Technology utilised
Deep learning computation with CPU Cores and GPU CUDA Cores running in parallel.
220Ohm resistor example
Color codes
Red = 2
Brown = 1
Gold = 5% tolerance
100nF capacitor example
Unfortunately, for the purposes of automatic identification of Integrated Circuit markings, most IC manufacturers do not follow any global standard for marking their ICs.
Most manufacturers tend to have their own internal IC marking standards.
Due to this fact - only known markings can be used to identify components.
Mixed manufacturer ICs example
This example illustrates the vast variation in markings, and the lack of markings that are identifiable without access to datasheets.
Architecture
YOLOv5
Architecture
A board of insulating laminate clad in copper, with parts of the copper etched away so that only conductive tracks remain in specific positions that are pre-planned using CAD software.
Widely used to implement electronic circuits.
CAD
Computer Aided Design
CAD software accelerates and automates design work in various fields.
Instructions are given to the computer through intuitive, usually GUI-based interactive programs, and are translated into the underlying design.
Electrical current that periodically reverses direction.
Electrical current that flows in a constant direction.
An electrical component that emits light when current is passed through it.
A resistor whose resistance varies relative to the amount of light the body of the component is exposed to.
A ring light has been added for both training and inference running.
Historical
Issues encountered
A glitch in the augmentation provided by YOLOv5, where rotation during augmentation shifted the bounding boxes of the components, causing inaccurate feedback to the model and preventing it from training appropriately.
Actual bounding boxes after rotation augmentation
Note the unnecessarily expanded bounding boxes.
Description
Submitted GitHub issue
Link
https://github.com/ultralytics/yolov5/issues/10639
Information gathered from replies as of today's date
This issue has been reported to be part of YOLOv7 augmentation also.
Example
Expected bounding boxes after rotation augmentation
Note the snug fit of the bounding box around the edges of the component.
That is desirable, as it provides accurate information on what the model should be
looking for.
This will train the model in undesirable ways, detecting parts it should not.
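The expansion can be reproduced with a few lines of C++ (names are illustrative): rotating the four corners of an axis-aligned box and re-boxing them necessarily produces a larger box for any angle that is not a multiple of 90°, which is why a correct augmentation pipeline must re-derive the box from the rotated object mask or polygon instead.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <utility>

struct Box { double x1, y1, x2, y2; };

// Rotate the corners of an axis-aligned box by `deg` around (cx, cy) and
// return the axis-aligned box enclosing the rotated corners. For angles
// that are not multiples of 90 degrees this box is larger than a snug box
// around the rotated component - producing the loose boxes described above.
Box rotateAABB(const Box& b, double deg, double cx, double cy) {
    const double rad = deg * 3.14159265358979323846 / 180.0;
    const double c = std::cos(rad), s = std::sin(rad);
    const std::array<std::pair<double, double>, 4> corners = {{
        {b.x1, b.y1}, {b.x2, b.y1}, {b.x2, b.y2}, {b.x1, b.y2}}};
    Box out{1e18, 1e18, -1e18, -1e18};
    for (const auto& p : corners) {
        double dx = p.first - cx, dy = p.second - cy;
        double rx = cx + dx * c - dy * s;  // standard 2D rotation
        double ry = cy + dx * s + dy * c;
        out.x1 = std::min(out.x1, rx); out.y1 = std::min(out.y1, ry);
        out.x2 = std::max(out.x2, rx); out.y2 = std::max(out.y2, ry);
    }
    return out;
}
```

For example, rotating a unit box by 45° grows its enclosing width from 1 to sqrt(2), matching the unnecessarily expanded boxes in the screenshots.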
Augmentation rotation issue
Software
Description
Epochs
Augmentation
Training
Description
Pretrained models
Loss function
train vs val
labels
Created With
MindMaster